Regret Minimization in MDPs with Options without Prior Knowledge

Authors

  • Ronan Fruit
  • Matteo Pirotta
  • Alessandro Lazaric
  • Emma Brunskill
Abstract

Motivations
  • “Flat” RL: it is difficult to learn complex behaviours (e.g., a sequence of subgoals) ⇒ humans abstract away from low-level actions.
  • Hierarchical RL: decompose large problems into smaller ones by imposing constraints on the value function or on the policy.
  • Possible implementation: options [Sutton et al., 1999] (see the sketch below).
  • Empirical observations: introducing options in an MDP can speed up learning but can also be harmful [Jong et al., 2008] ⇒ lack of theoretical motivation for, and understanding of, options.

Theoretical analysis of learning with options
  • Adding options does not just reduce the space of stationary policies; exploration is also greatly affected.
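The options framework referenced above [Sutton et al., 1999] defines a (Markov) option as a triple (I, π, β): an initiation set I ⊆ S of states where the option may be invoked, an internal policy π mapping states to primitive actions, and a termination condition β(s) giving the probability that the option stops in state s. The Python sketch below is only a minimal illustration of that triple on a toy 4-state chain; the names (Option, move_right, the action encoding, the deterministic transition) are hypothetical and not taken from the paper.

from dataclasses import dataclass
from typing import Callable, Dict, Set
import random

@dataclass
class Option:
    # An option in the sense of Sutton et al. (1999): a triple (I, pi, beta).
    initiation_set: Set[int]                  # I: states where the option may be invoked
    policy: Dict[int, int]                    # pi: internal policy, state -> primitive action
    termination_prob: Callable[[int], float]  # beta(s): probability of terminating in state s

    def can_start(self, state: int) -> bool:
        return state in self.initiation_set

    def act(self, state: int) -> int:
        return self.policy[state]

    def terminates(self, state: int, rng: random.Random) -> bool:
        return rng.random() < self.termination_prob(state)

# Toy example on a 4-state chain {0, 1, 2, 3}: an option that keeps taking the
# primitive action "right" until it reaches state 3, where it terminates surely.
move_right = Option(
    initiation_set={0, 1, 2},
    policy={s: 1 for s in range(4)},          # action 1 = "right" (hypothetical encoding)
    termination_prob=lambda s: 1.0 if s == 3 else 0.0,
)

rng = random.Random(0)
state = 0
assert move_right.can_start(state)
while not move_right.terminates(state, rng):
    _action = move_right.act(state)
    state = min(state + 1, 3)                 # deterministic "right" transition in this toy chain
print("option terminated in state", state)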


Similar articles

Exploration-Exploitation in MDPs with Options

While a large body of empirical results shows that temporally-extended actions and options may significantly affect the learning performance of an agent, the theoretical understanding of how and when options can be beneficial in online reinforcement learning is relatively limited. In this paper, we derive an upper and lower bound on the regret of a variant of UCRL using options. While we first a...
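For orientation only (the abstract above is truncated, and its specific option-based bounds are not reproduced here), the quantity these UCRL-style analyses control is the average-reward regret in the sense of Jaksch et al. (2010); roughly speaking, the option-based analyses replace the flat quantities in the known baseline below with their SMDP counterparts. A worked LaTeX statement of the flat baseline, which is a known result and not the bound of the cited paper:

% Regret after T steps, with optimal average reward rho^* and collected rewards r_t,
% and the known high-probability order of the flat UCRL2 bound (Jaksch et al., 2010),
% where D is the MDP diameter, S the number of states, A the number of primitive actions:
\Delta(T) \;=\; T\rho^{*} \;-\; \sum_{t=1}^{T} r_t,
\qquad
\Delta(T) \;=\; \tilde{O}\!\left( D\, S\, \sqrt{A\, T} \right).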


A Regret Minimization Approach in Product Portfolio Management with respect to Customers’ Price-sensitivity

In an uncertain and competitive environment, product portfolio management (PPM) becomes more challenging for manufacturers to decide what to make and establish the most beneficial product portfolio. In this paper, a novel approach in PPM is proposed in which the environment uncertainty, competitors’ behavior and customer’s satisfaction are simultaneously considered as the most important criteri...


Generalised Entropy MDPs and Minimax Regret

Bayesian methods suffer from the problem of how to specify prior beliefs. One interesting idea is to consider worst-case priors. This requires solving a stochastic zero-sum game. In this paper, we extend well-known results from bandit theory in order to discover minimax-Bayes policies and discuss when they are practical.


Pricing Exotic Derivatives Using Regret Minimization

We price various financial instruments, which are classified as exotic options, using the regret bounds of an online algorithm. In addition, we derive a general result, which upper bounds the price of any derivative whose payoff is a convex function of the final asset price. The market model used is adversarial, making our price bounds robust. Our results extend the work of [9], which used regr...


Minimizing Regret in Dynamic Decision Problems

The menu-dependent nature of regret-minimization creates subtleties in applying regret-minimization to dynamic decision problems. Firstly, it is not clear whether forgone opportunities should be included in the menu. We explain commonly observed behavioral patterns as minimizing regret when forgone opportunities are present, and also show how the treatment of forgone opportunities affects behav...




Publication date: 2017